Abstract

Fuzzy logic and neural networks provide new methods for designing 
control systems. Fuzzy logic controllers do not require a complete an- 
alytical model of a dynamic system and can provide knowledge-based 
heuristic controllers for ill-defined and complex systems. Neural
networks can be used for learning control. In this chapter, we discuss
hybrid methods using fuzzy logic and neural networks which can start 
with an approximate control knowledge base and refine it through re- 
inforcement learning. 


1. INTRODUCTION AND MOTIVATION 

What is the fundamental difference between Fuzzy Logic Controllers 
(FLCs) and those that are based on conventional control theory? How 
can FLCs learn and adaptively change their performance? These are
among the main questions that we discuss in some detail
in this chapter. However, to briefly answer the first question, FLCs
do not require a complete analytical model of a dynamic system and 
can provide knowledge-based heuristic controllers for ill-defined and 
complex systems. As for the second question above, in this chapter, 
we consider neural networks to provide learning capability for FLCs 
although other learning methods of artificial intelligence may also be 
used. 

This chapter is not intended to provide a complete survey on either 
FLCs or applications of neural networks in control, since other appro- 
priate surveys on these topics are already available (e.g., see Berenji 
[8], Sugeno [29], Barto [5], and Antsaklis [2]). However, in this chapter, 
we will first cover some basics of fuzzy set theory and its application
in designing FLCs. Next we discuss some issues related to the stability
analysis of FLCs and some applications of this theory. Then we
briefly describe the application of neural nets in control with a view 
toward a special family of techniques known as reinforcement learning. 
We then discuss how FLCs can learn from experience through rein- 
forcement learning. We conclude the chapter by listing a number of 
research directions for this field. 


2. FUZZY LOGIC FOR INTERPOLATIVE 
REASONING 

In his seminal work, Zadeh [42] introduced fuzzy set theory as an
extension of classical set theory. Non-fuzzy sets allow only full
membership or no membership at all, whereas fuzzy sets allow partial
membership. In other words, an element may partially belong to a set,
with a degree of membership ranging from 0 to 1. Here, we review some
basic concepts of fuzzy sets; see [17, 8] for a more complete
discussion.

Assuming that A and B are two fuzzy sets with membership functions
$\mu_A$ and $\mu_B$ respectively, then the complement of fuzzy set A is a
fuzzy set $\bar{A}$ with membership function

$$\mu_{\bar{A}} = 1 - \mu_A.$$

Traditionally, in fuzzy logic, the union and the intersection of sets A
and B are defined using Max and Min operators:

$$\mu_{A \cup B} = \max\{\mu_A, \mu_B\},$$

$$\mu_{A \cap B} = \min\{\mu_A, \mu_B\}.$$
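As a minimal sketch of these three definitions, the Python functions below apply the complement, Max, and Min operators to membership degrees in [0, 1]. The function names and the sample degrees 0.3 and 0.8 are illustrative assumptions, not part of the original text.

```python
def complement(mu_a):
    """Membership degree in the complement of A: 1 - mu_A."""
    return 1.0 - mu_a


def union(mu_a, mu_b):
    """Membership degree in A union B using the Max operator."""
    return max(mu_a, mu_b)


def intersection(mu_a, mu_b):
    """Membership degree in A intersect B using the Min operator."""
    return min(mu_a, mu_b)


# Hypothetical membership degrees of one element in A and in B.
mu_a, mu_b = 0.3, 0.8
print(complement(mu_a), union(mu_a, mu_b), intersection(mu_a, mu_b))
```

Unlike in classical set theory, an element can have nonzero membership in both A and its complement at the same time.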

However, generalized families of these operators, known as triangular
norms and triangular co-norms, have also been extensively studied
in the past. Berenji, et al. [9] have studied a different generalized
family of operators known as Ordered Weighted Averaging (OWA)
operators and have applied it to control. The OWA operators, introduced
by Yager [40] for multi-criteria decision making, provide a facility to
implement various types of aggregation operators commonly used in
fuzzy control. The OWA operators generalize the ordinary AND and OR
functions used in rule-based control [40].
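To illustrate how an OWA operator spans the range between AND and OR, the sketch below sorts the arguments in descending order and takes a weighted sum with weights that add up to 1. The function name `owa` and the sample values are assumptions made for this example.

```python
def owa(values, weights):
    """Ordered Weighted Averaging: weights apply to the sorted values,
    largest first, not to particular arguments."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


degrees = [0.2, 0.9, 0.5]
print(owa(degrees, [1.0, 0.0, 0.0]))  # all weight on the largest value: Max (OR)
print(owa(degrees, [0.0, 0.0, 1.0]))  # all weight on the smallest value: Min (AND)
print(owa(degrees, [1 / 3] * 3))      # equal weights: arithmetic mean
```

Weight vectors between these extremes give aggregations that are stricter than OR but more lenient than AND.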

3. BASIC ARCHITECTURE OF FLC 

In the design of a fuzzy controller, one must identify the main control
variables and determine a term set that is at the right level of
granularity for describing the values of each variable. For example, a
term set including linguistic values such as {Small, Medium, Large} may
be satisfactory in some domains, whereas other domains may instead
require the use of a five-term set such as {Very Small, Small, Medium,
Large, Very Large}.

Figure 1: A simple architecture of a fuzzy logic controller.

Figure 2: Matching a sensor reading $x_0$ with the membership function
$\mu(x)$ to get $\mu(x_0)$; (a) crisp sensor reading (b) fuzzy sensor reading.

After the linguistic term sets for the main control variables are de- 
termined, a knowledge base is developed using these control variables 
and the values that they may take. If the knowledge base is a rule 
base, more than one rule may fire simultaneously; hence it requires 
a conflict resolution method for decision making, as will be described 
later. 

Figure 1 illustrates a simple architecture for a fuzzy logic controller. 
This architecture consists of four modules whose functions are de- 
scribed next. 

3.1 Encoder 

In coding the values from the sensors, one transforms the sensor
measurements into the linguistic labels used in the
preconditions of the rules. If the sensor reading has a crisp value,
then the fuzzification stage requires matching the sensor measurement
against the membership function of the linguistic label as shown in
Figure 2(a). If the sensor reading contains noise, it may be modeled by
a triangular membership function where the vertex of the triangle
refers to the mean value of the data set of sensor measurements and
the base refers to a function of the standard deviation (e.g., twice the
standard deviation, as used in [39]). In this case, fuzzification
refers to finding the intersection of the label’s membership function
and the distribution for the sensed data as shown in Figure 2(b).
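The two fuzzification cases can be sketched as follows. The helper names, the sampling grid, and the choice to report the peak of the pointwise minimum as the matching degree are assumptions made for illustration; the text only states that the two membership functions are intersected. The base of the sensed-data triangle is taken as twice the sample standard deviation.

```python
import statistics


def triangle(left, peak, right):
    """Return a triangular membership function with support (left, right)."""
    def mu(x):
        if x <= left or x >= right:
            return 0.0
        if x <= peak:
            return (x - left) / (peak - left)
        return (right - x) / (right - peak)
    return mu


def match_crisp(label, x0):
    """Crisp reading: evaluate the label's membership function at x0."""
    return label(x0)


def match_noisy(label, readings, lo, hi, steps=2001):
    """Noisy reading: model the data as a triangle (vertex at the mean,
    base of twice the standard deviation) and return the height of its
    intersection with the label, sampled on a grid over [lo, hi]."""
    mean = statistics.mean(readings)
    half_base = statistics.stdev(readings)  # needs at least two readings
    if half_base == 0.0:
        return label(mean)  # identical readings behave like a crisp value
    sensed = triangle(mean - half_base, mean, mean + half_base)
    grid = (lo + i * (hi - lo) / (steps - 1) for i in range(steps))
    return max(min(label(x), sensed(x)) for x in grid)


medium = triangle(0.0, 5.0, 10.0)  # hypothetical label "Medium"
print(match_crisp(medium, 2.5))
print(match_noisy(medium, [4.0, 5.0, 6.0], 0.0, 10.0))
```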

3.2 Knowledge Base 

There are two main tasks in designing the control knowledge base. 
First, a set of linguistic variables must be selected which describe the 
values of the main control variables of the process. Both the main 
input variables and the main output variables must be linguistically 
defined in this stage using proper term sets. The selection of the level 
of granularity of a term set for an input variable or an output variable 
plays an important role in the smoothness of control. Secondly, a con- 
trol knowledge base must be developed which uses the above linguistic 
description of the main variables. Sugeno [29] has suggested four meth- 
ods for doing this: Expert’s Experience and Knowledge, Modeling the 
Operator’s Control Actions, Modeling a process, and Self Organiza- 
tion. 

Among these methods, the first method is the most widely used [21]. 
In modeling the human expert operator’s knowledge, fuzzy control 
rules of the form: 

IF Error is small and Change-in-error is small, Then force is small. 

have been used in studies such as [30]. This method is effective when 
expert human operators can express the heuristics or the knowledge 
that they use in controlling a process in terms of rules of the above 
form. Applications have been developed in process control (e.g.,
cement kiln operations [15]). Besides the ordinary fuzzy control rules
used by Mamdani and others, where the conclusion
of a rule is another fuzzy variable, a rule can be developed whereby
its conclusion is a function of the input variables. For example, the
following implication can be written:

IF X is $A_1$ and Y is $B_1$ THEN $Z = f_1(X, Y)$

where the output Z is a function of the values that X and Y may take.

The second method directly models the control actions of the human
operator. Takagi and Sugeno [35] and Sugeno and Murakami [30] have
used this method for modeling the control actions of a driver in parking
a car.

The third method deals with fuzzy modeling of a process where an
approximate model of the plant is configured by using implications 
that describe the possible states of the system. In this method, a 
model is developed and a fuzzy controller is constructed to control the 
fuzzy model, making this approach similar to the traditional approach 
taken in control theory. Hence, structure identification and parameter 
identification processes are needed. For example, a rule discussed by 
Sugeno [29] is of the form: 

IF $x_1$ is $A^i_1$ and ... and $x_m$ is $A^i_m$ THEN $y = p^i_0 + p^i_1 x_1 + \ldots + p^i_m x_m$,

for $i = 1, \ldots, n$, where n is the number of such implications and the
consequence is a linear function of the m input variables.
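A rule base of this form can be evaluated as in the sketch below. Each rule pairs antecedent membership functions with the coefficients of its linear consequent. Combining the rule outputs by a firing-strength-weighted average and using Min as the conjunction are assumptions made for this example, as are the membership functions `small` and `large` and all coefficients.

```python
def linear_consequent(coeffs, inputs):
    """y = p0 + p1*x1 + ... + pm*xm for one implication."""
    p0, *ps = coeffs
    return p0 + sum(p * x for p, x in zip(ps, inputs))


def evaluate_rules(rules, inputs):
    """rules: list of (membership functions, coefficients), one membership
    function per input. Returns the strength-weighted average output."""
    num = 0.0
    den = 0.0
    for memberships, coeffs in rules:
        strength = min(mu(x) for mu, x in zip(memberships, inputs))
        num += strength * linear_consequent(coeffs, inputs)
        den += strength
    if den == 0.0:
        raise ValueError("no rule fires for these inputs")
    return num / den


def small(x):
    return max(0.0, min(1.0, 1.0 - x / 10.0))


def large(x):
    return max(0.0, min(1.0, x / 10.0))


rules = [
    ([small], (0.0, 1.0)),   # IF x is small THEN y = 0 + 1*x
    ([large], (10.0, 0.0)),  # IF x is large THEN y = 10 + 0*x
]
print(evaluate_rules(rules, [5.0]))
```

Structure identification corresponds to choosing the number of rules and their antecedents, and parameter identification to fitting the coefficients.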

Finally, the fourth method refers to the research of Mamdani and 
his students in developing self-organizing controllers [26]. The main 
idea in this method is the development of rules which can be adjusted 
over time to improve the controllers’ performance. 

3.3 Decision Making Logic 

As mentioned earlier, because of the partial matching attribute of
fuzzy control rules and the fact that the preconditions of rules
overlap, usually more than one fuzzy control rule can fire at a time. The
methodology which is used in deciding what control action should be
taken as the result of the firing of several rules can be referred to as
conflict resolution. The following example, using two rules, illustrates
this process. Assume that we have the following rules:

Rule 1: IF X is $A_1$ and Y is $B_1$ THEN Z is $C_1$
Rule 2: IF X is $A_2$ and Y is $B_2$ THEN Z is $C_2$

Now, if we have $x_0$ and $y_0$ as the sensor readings for fuzzy variables X
and Y, then for Rule 1, their truth values are represented by $\mu_{A_1}(x_0)$
and $\mu_{B_1}(y_0)$, where $\mu_{A_1}$ and $\mu_{B_1}$ represent the membership
functions for $A_1$ and $B_1$, respectively. Similarly for Rule 2, we have
$\mu_{A_2}(x_0)$ and $\mu_{B_2}(y_0)$ as the truth values of the preconditions.
Assuming that a minimum operator is used as the conjunction operator, the
strength of Rule 1 can be calculated by:

$$w(1) = \min(\mu_{A_1}(x_0), \mu_{B_1}(y_0)).$$

Similarly for Rule 2:

$$w(2) = \min(\mu_{A_2}(x_0), \mu_{B_2}(y_0)).$$

The control output of Rule 1 is calculated by applying the matching
strength of its preconditions to its conclusion (see footnote 1):

$$z(1) = \mu_{C_1}^{-1}(w(1)),$$

and for Rule 2:

$$z(2) = \mu_{C_2}^{-1}(w(2)).$$

This means that as a result of reading sensor values $x_0$ and $y_0$, Rule
1 is recommending control action z(1) and Rule 2 is recommending
control action z(2).
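The two-rule example can be traced numerically as in the following sketch. The precondition truth values, the output range [0, 10], and the choice of an increasing linear membership function for one conclusion and a decreasing one for the other are all hypothetical; they are chosen only so that the inverse membership functions exist.

```python
def firing_strength(mu_a, mu_b):
    """Rule strength with Min as the conjunction operator."""
    return min(mu_a, mu_b)


def inverse_increasing(w, z_min, z_max):
    """Inverse of the monotonic membership mu(z) = (z - z_min)/(z_max - z_min)."""
    return z_min + w * (z_max - z_min)


def inverse_decreasing(w, z_min, z_max):
    """Inverse of the monotonic membership mu(z) = (z_max - z)/(z_max - z_min)."""
    return z_max - w * (z_max - z_min)


# Hypothetical truth values of the preconditions at the readings x0, y0.
w1 = firing_strength(0.7, 0.4)   # Rule 1
w2 = firing_strength(0.2, 0.6)   # Rule 2

# Control actions recommended by each rule on an assumed output range [0, 10].
z1 = inverse_increasing(w1, 0.0, 10.0)
z2 = inverse_decreasing(w2, 0.0, 10.0)
print(w1, z1, w2, z2)
```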

3.4 Decoder 

Also known as the defuzzifier, the decoder produces a nonfuzzy control
action that best represents the membership function of an inferred fuzzy
control action as a result of combining several rules. Several
defuzzification methods such as center of area (COA) and mean of maxima
(MOM) have been suggested. The COA method calculates the center
of the area resulting from superimposing the conclusions of the firing
rules, and the MOM method averages the values for which the
combined membership function reaches its maximum.
These methods are reviewed in [8]. In the example discussed
above and shown in Figure 3, the combination of the rules produces
a nonfuzzy control action z* which is calculated using Tsukamoto’s
defuzzification method:

$$z^* = \frac{\sum_{i=1}^{n} w(i)\, z(i)}{\sum_{i=1}^{n} w(i)}$$

where n is the number of rules with firing strength, w(i), greater than
0 (n = 2 in the above example) and z(i) is the amount of control action
recommended by rule i.
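The three defuzzification methods mentioned in this section can be sketched as follows. The weighted average implements the formula for z* directly; the COA and MOM functions operate on a membership function sampled on a uniform grid, which is a discretization assumed for this example. All names and sample values are illustrative.

```python
def weighted_average(strengths, actions):
    """z* = sum(w(i) * z(i)) / sum(w(i)) over rules with w(i) > 0."""
    fired = [(w, z) for w, z in zip(strengths, actions) if w > 0.0]
    if not fired:
        raise ValueError("no rule fired")
    return sum(w * z for w, z in fired) / sum(w for w, _ in fired)


def center_of_area(zs, mus):
    """Discrete center of area of a combined membership function sampled
    at uniformly spaced points zs."""
    total = sum(mus)
    if total == 0.0:
        raise ValueError("membership function is zero everywhere")
    return sum(z * m for z, m in zip(zs, mus)) / total


def mean_of_maxima(zs, mus):
    """Average of the sample points where the membership is maximal."""
    peak = max(mus)
    at_peak = [z for z, m in zip(zs, mus) if m == peak]
    return sum(at_peak) / len(at_peak)


# Two firing rules with hypothetical strengths and recommended actions.
print(weighted_average([0.4, 0.2], [4.0, 8.0]))
```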


Footnote 1: Here, it should be noted that the inverse functions can only
be defined for monotonic membership functions. Since most fuzzy
membership functions are defined using non-monotonic functions, other
mapping functions have been used in the literature, which are reviewed
in [8]. For simplicity, we explain the mapping and defuzzification
processes using monotonic functions only, although other approaches
(also reviewed in [8]) are equally applicable.

Figure 3: Defuzzification of the combined conclusion of rules using
Tsukamoto’s monotonic membership functions. Panels: Rule 1 and Rule 2;
$z^* = \frac{z(1)w(1) + z(2)w(2)}{w(1) + w(2)}$.

3.5 Hierarchical Fuzzy Control

Berenji, et al. [11] have proposed the following algorithm for the design
of fuzzy controllers with multiple goals.

1. Let $G = \{g_1, g_2, \ldots, g_n\}$ be the set of goals to be achieved and
maintained. Notice that for $n = 1$ (i.e., no interacting goals), the
problem becomes simpler and may be handled using the earlier methods
in fuzzy control (e.g., see [21]).

2. Let $G' = p(G)$ where p is a function that assigns priorities among
the goals and $G'$ is the prioritized list of goals in G. We assume
that such a function can be obtained in the given domain. In many
control problems, it is possible to specifically assign priorities to the
goals. For example, in the simple problem of balancing a pole on
the palm of a hand and also moving the pole to a pre-determined
location, it is possible to do this by first keeping the pole as
vertical as possible and then gradually moving to the desired location.
Although these goals are highly interactive (i.e., as soon as we
notice that the pole is falling, we may temporarily set aside the other
goal of moving to the desired location), we still can assign priorities
fairly well.

3. Let $U = \{u_1, u_2, \ldots, u_n\}$ where $u_i$ is the set of input control
variables related to achieving $g_i$.

4. Let $A = \{a_1, a_2, \ldots, a_n\}$ where $a_i$ is the set of linguistic values
used to describe the values of the input control variables in $u_i$.

5. Let $C = \{c_1, c_2, \ldots, c_n\}$ where $c_i$ is the set of linguistic values
used to describe the values of output control variables.

6. Acquire the rule set $R_1$ of approximate control rules directly related
to the highest priority goal. These rules are in the general form of

IF $u_1$ is $a_1$ THEN Z is $c_1$.

7. For $i = 2$ to n, form the rule sets $R_i$. The format of the rules in
these rule sets is similar to the ones in the previous step except that
they include aspects of approximately achieving the previous goal:

IF $g'_{i-1}$ is approximately achieved and $u_i$ is $a_i$ THEN Z is $c_i$.

The approximate achievement of a goal in step 7 of the above algorithm
refers to holding the goal parameters within smaller boundaries.
The interactions among goal $g'_i$ and goal $g'_{i-1}$ are handled by
forming rules which include more preconditions in the left hand side. For
example, let us assume that we have acquired a set of rules $R_1$ for
keeping a pole vertical. In writing the second rule set $R_2$ for moving to
a pre-specified location, aspects of approximately achieving $g'_1$ should
be combined with control parameters for achieving $g'_2$. For example,
a precondition such as "the pole is almost balanced" can be added while
writing the rules for moving to a specific location. A fuzzy set
operation known as concentration [43] can be used here to systematically
obtain more focused membership functions for the parameters which
represent the achievement of previous goals. The above algorithm has
been applied in cart-pole balancing and more details can be found in
[11].
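As a small illustration of how concentration narrows a membership function, the sketch below squares the membership degree, which is the usual definition of the concentration operation, and uses the result as an extra precondition in a lower-priority rule. The membership functions `balanced` and `far_from_target`, the 12-degree and 2-meter ranges, and the Min conjunction are hypothetical choices made for this example.

```python
def concentrate(mu):
    """Concentration of a fuzzy set: square the membership degree."""
    return lambda x: mu(x) ** 2


def balanced(theta_deg):
    """Hypothetical membership of 'pole is balanced' in the pole angle."""
    return max(0.0, 1.0 - abs(theta_deg) / 12.0)


def far_from_target(position_m):
    """Hypothetical membership of 'cart is far from the target location'."""
    return max(0.0, min(1.0, abs(position_m) / 2.0))


# A more focused version of the first goal, used as a precondition of a
# rule in the second rule set.
goal_1_achieved = concentrate(balanced)


def second_level_strength(theta_deg, position_m):
    """Strength of a rule 'IF goal 1 is approximately achieved and the
    cart is far from the target THEN ...' with Min as the conjunction."""
    return min(goal_1_achieved(theta_deg), far_from_target(position_m))


print(balanced(6.0), goal_1_achieved(6.0))
print(second_level_strength(6.0, 1.5))
```

Because squaring lowers every degree below 1, the lower-priority rule fires strongly only when the pole is close to vertical.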


3.6 Stability Analysis 

Stability analysis of fuzzy logic controllers is an important issue for 
which only a limited number of studies have been performed. For 
example, Braae and Rutherford [13] used control rules as transition 
functions between the states of the system in terms of a generalized 
state space. Kiszka, et al. [16] have provided an energistic approach 
to the analysis of stability and robustness of FLCs, where an energy 
function can be found which consistently decreases along a solution 
trajectory. This approach is similar to Chen’s approach [14] in using 
the concept of cell to cell mapping, but Chen uses a Lyapunov based 
approach. Finally, Langari [18] provides a stability analysis for FLCs 
under the assumption that plant structure with unknown but bounded 
parameters is available; however, some assumptions are placed on plant 
dynamics. For further analysis of stability in FLC, refer to Langari and 
Berenji [19]. 

3.7 Applications of Fuzzy Logic Controllers 

Mamdani and Assilian [21] were the first to apply fuzzy set theory 
to control problems (e.g., the control of a laboratory steam engine). 
This experiment triggered some other applications, and in recent years 
there has been a very significant increase in the number of applica- 
tions of fuzzy logic control. Currently, there are numerous products on 
the market which use fuzzy logic control (mostly designed in Japan); 
Berenji [8] reviews a number of these applications. In the following, 
we discuss a few of these systems in more detail. 

3.7.1 Automatic train control 

Yasunobu and Miyamoto at Hitachi, Ltd. [41] have designed a fuzzy 
controller for the Automatic Train Operation (ATO) system which 
has been in use in the city of Sendai, Japan since July 1987. The two 
main operations of the system are Constant Speed Control (CSC) and 
Train Automatic Stop Control (TASC). The CSC operation results 
in maintaining a constant target speed (specified by the operator at
the start of the train operation) during the train travel. The TASC
operation controls the speed of the train in order to stop the train 
at the prespecified location. The system uses only a few rules (i.e., 12 
rules for each of the CSC and the TASC operations), and the control is 
evaluated every 100 milliseconds. These operations use the evaluation 
of safety, riding comfort, traceability of target velocity, accuracy of 
stop gap, running time, and energy consumption criteria in deciding a 
control strategy. The control rules are of the predictive fuzzy form: 

If ($u$ is $C_i \rightarrow x$ is $A_i$ and $y$ is $B_i$) Then $u$ is $C_i$.

For example, when the train is in the TASC zone, the following rule 
is used: 

If the control notch is not changed and 

the train will stop at the predetermined location, then 

the control notch is not changed. 

The system performs as skillfully as human experts do and is superior
to an ordinary PID (see footnote 2) automatic train operation controller
in terms of stopping precision, energy consumption, riding comfort, and
running time.

3.7.2 Sugeno’s model helicopter 

Sugeno has initiated several projects on applying fuzzy logic to the 
control of a model helicopter. Among these are radio control by oral 
instructions, automatic autorotation entry in engine failure cases, and 
unmanned helicopter control for sea rescue [28]. Although these projects 
have just started, several interesting results have already been achieved. 
The input variables from the helicopter include pitch, roll, and yaw, 
and their first and second derivatives. The control rules written for the 
helicopter regulate the up/down, forward/backward, left/right, and 
nose direction. For example, the longitudinal stick controls pitch and,
therefore, forward/backward movement of the rotorcraft.

An example of a fuzzy control rule for hovering is the following: 

If the body rolls, then 
control the lateral in reverse. 

Or as another example for hovering control: 

If the body pitches, then 
control the longitude in reverse. 


The helicopter is inherently unstable, and the helicopter control
problems under study in the above projects are challenging control
problems. These studies have already produced results which illustrate the
strength of the fuzzy logic control technology.

Footnote 2: Proportional, Integral, and Derivative.

4. NEURAL NETWORKS FOR LEARNING 
CONTROL 

Connectionist learning approaches [5] can be used in learning control. 
Here, we can distinguish three classes: supervised learning,
reinforcement learning, and unsupervised learning. In supervised learning, at
each time step, a teacher provides the desired control objective to the 
learning system. In reinforcement learning, the teacher’s response is 
not as direct and informative as in supervised learning and it serves 
more to evaluate the state of the system. In unsupervised learning, 
the presence of a teacher or a supervisor to provide the correct control 
response is not assumed. 

If supervised learning can be used in control (e.g., when the input- 
output training data is available), it has been shown that it is more 
efficient than reinforcement learning (e.g., faster learning [6, 1]). How- 
ever, many control problems require selecting control actions whose 
consequences emerge over uncertain periods for which input-output 
training data are not readily available. In such domains, reinforcement 
learning techniques are more appropriate than supervised learning. 


4.1 Reinforcement Learning in Control 

As mentioned earlier, in reinforcement learning, one assumes that there 
is no supervisor to critically judge the chosen control action at each 
time step. The learning system is told indirectly about the effect of its 
chosen control action. The study of reinforcement learning is related to 
the credit assignment problem where, given the performance (results) 
of a process, one has to distribute reward or blame to the individ- 
ual elements contributing to that performance. In rule-based systems, 
for example, this means assigning credit or blame to individual rules 
engaged in the problem solving process. Samuel’s checker-playing
program is probably the earliest AI program which used this idea [27].
Michie and Chambers [23] used a reward-punishment strategy in their
BOXES system, which learned to do cart-pole balancing by discretizing
the state space into non-overlapping regions (boxes) and applying
two opposite constant forces. Barto, Sutton, and Anderson [4] used
two neuron-like elements to solve the learning problem in cart-pole
balancing. In these approaches, the state space is partitioned into
non-overlapping regions and then the credit assignment is performed
on a local basis.

Reinforcement learning has its roots in studies of animal learning and 
psychological research on human behavior (e.g., [3]). It directly relates 
to the theory of learning automata originated by the work of Tsetlin 
[36] and further developed by the work of Narendra and Thathachar 
[25], Narendra and Lakshmivarahan [24], and Mendel and McLaren [22] 
in control engineering. Since reinforcement learning techniques can be 
used without an explicit teacher or supervisor, they construct an
internal evaluator, or a critic, capable of evaluating the dynamic system’s
performance. How to construct this critic so that it can properly
evaluate the performance in a way that is useful to the control objective is
itself a significant problem in reinforcement learning. Given the
evaluation by the critic, the other problem in reinforcement learning is how
to adjust the control signal. Barto [5] discusses several approaches 
to this problem based on the gradient of the critic’s evaluation as a 
function of control signals. 

4.1.1 Temporal Difference methods 

Related to reinforcement learning are the Temporal Difference (TD) 
methods, a class of incremental learning procedures specialized for pre- 
diction problems, which were introduced by Sutton [32]. The main 
characteristic of these methods is that they learn from successive pre- 
dictions; whereas, in the case of supervised learning methods, learning 
occurs when the difference between the predicted outcome and the 
actual outcome is revealed (i.e., the learning model in TD does not 
have to wait until the actual outcome is known and can update its pa- 
rameters within a trial period). The difference between the TD meth- 
ods and the supervised learning methods becomes clear when these 
methods are distinguished as single-step versus multi-step prediction 
problems. In single-step prediction (e.g., the Widrow-Hoff rule [38]),
complete information regarding the correctness of a prediction is
revealed at once. In multi-step prediction, by contrast, this information
is not revealed until more than one step after the prediction is made;
however, partial information becomes available at each step. Barto, et
al. [7] have recently shown a stronger relation between a specific class
of these methods, called the TD algorithm, and dynamic programming.
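The idea of learning from successive predictions can be sketched with the simplest temporal difference update, often written TD(0): each state's prediction is moved toward the reward just received plus the prediction for the next state, without waiting for the final outcome. The three-state chain, the step size `alpha`, and the discount `gamma` below are assumptions made for this example.

```python
def td0_episode(values, states, rewards, alpha=0.5, gamma=1.0):
    """Run one trial, updating each state's prediction in place from the
    next state's current prediction (0 after the final state)."""
    for t, state in enumerate(states):
        next_value = values[states[t + 1]] if t + 1 < len(states) else 0.0
        td_error = rewards[t] + gamma * next_value - values[state]
        values[state] += alpha * td_error
    return values


# Hypothetical chain A -> B -> C -> end, with outcome 1 revealed at the end.
values = {"A": 0.0, "B": 0.0, "C": 0.0}
for _ in range(2):
    td0_episode(values, ["A", "B", "C"], [0.0, 0.0, 1.0])
print(values)
```

The final outcome propagates backward by one state per trial here, so predictions for earlier states improve only after later ones have.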

4.1.2 Q-Learning

One of the most promising techniques in reinforcement learning is
Q-Learning, as developed by Watkins [37]. In Q-Learning, a real-valued
function, Q, of states and actions is estimated. Specifically,
Q(x,a) represents the expected discounted sum of future reward for
performing action a in state x and performing optimally thereafter.
The Q-Learning algorithm keeps an estimate of Q, updating this esti- 
mate at each time step by observing the reinforcement received at that 
time step, applying the selected action, and observing the state of the 
system at the next time step. Q-Learning shows a strong relationship 
between reinforcement learning and on-line dynamic programming [33]. 
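A tabular form of this estimate can be sketched as below: after applying an action and observing the reinforcement and the next state, the entry Q(x, a) is moved toward the reinforcement plus the discounted best estimate available in the next state. The dictionary representation, the state and action names, and the values of `alpha` and `gamma` are assumptions made for this example.

```python
def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """One Q-Learning step on a table stored as {(state, action): value};
    missing entries are treated as 0."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


actions = ["left", "right"]
q = {}
print(q_update(q, "s0", "right", 1.0, "s1", actions))
```

The `max` over next-state actions is what ties the update to dynamic programming: it backs up the value of acting optimally from the next state onward.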

5. HYBRID NEURAL AND FUZZY CONTROLLERS 

Neural and fuzzy controllers are similar in that they both allow inter- 
polation. For example, once a neural network has been trained for a 
set of data, it can interpolate and produce answers for the cases not 
present in the data set. 

The main idea in integrating the fuzzy logic controllers with neural 
networks is to use the strength of each one collectively in the resulting 
neuro-fuzzy control system. This fusion allows: 

1. A human understandable expression of the knowledge used in con- 
trol in terms of the fuzzy control rules. This reduces the difficulties 
in describing the trained neural network which is usually treated 
as a black box. 

2. The fuzzy controller learns to adjust its performance automatically 
using a neural network structure and hence learns by accumulating 
experience. 

The emphasis of research on hybrid neural and fuzzy controllers has 
been on automatic design and refinement of the membership functions 
(e.g., see [20, 34]). In this section, we discuss how reinforcement learn- 
ing techniques can be used in refining fuzzy membership functions. 


5.1 Approximate Reasoning-based Intelligent Control 
(ARIC) 

The ARIC architecture, introduced by Berenji [10], provides adap- 
tive learning capability for fuzzy logic controllers. Figure 4 illustrates 
the ARIC architecture which contains three main elements: 

• The Action-state Evaluation Network (AEN), which acts as a critic 
and provides advice to the main controller. 

• The Action Selection Network (ASN) which includes a fuzzy logic 
controller. 

• The Stochastic Action Modifier (SAM), which changes the action
recommended by the ASN based on internal reinforcement. The
modified action is then sent to the plant.

Figure 4: The ARIC Architecture.

The AEN is based on Anderson’s [1] extension to Sutton’s [31] AHC 
algorithm in which the single-layer neural network in AHC was ex- 
tended to a multi-layer neural network. The ASN is a multi-layer 
neural network representation of a fuzzy logic controller, with as many 
hidden units as there are rules in the control knowledge base. The 
inputs to a hidden unit are the preconditions of a rule and its out- 
put is the conclusion of a rule. The ASN learns search heuristics as a 
probabilistic mapping from states to actions. The SAM stochastically 
modifies the action selected by the fuzzy controller. This change in the 
recommended action by ASN is more significant for a state if that state 
does not receive high internal reinforcements (i.e., probability of failure 
is high). On the other hand, if a state receives high reinforcements, 
SAM changes the action selected by the fuzzy controller by smaller 
magnitudes. This means that if the fuzzy logic controller embedded in 
ASN is performing well (e.g., after it has learned to control the system), 
then its recommendation is followed with no or only minor changes to 
it. However, when a state receives weak internal reinforcement (e.g., 
at the beginning of the learning process), SAM modifies the action 
recommended by the fuzzy controller more significantly. The details of 
this process are discussed in [10]; however, it should be noted that the 
learning element of ARIC’s architecture is similar to learning skills in 
humans who start with a collection of general rules (e.g., fuzzy control 
rules in ARIC) and refine them through practice (e.g., reinforcement 
learning through a number of trials). 

In summary, ARIC’s algorithm proceeds as follows.

1. Given an input, the ASN

• determines an action using fuzzy rules

• determines a measure of confidence in the fuzzy system’s
conclusion.

2. These two are appropriately used to produce the final action, which 
is sent to the physical system. As a result of this action, the system 
moves to a new state. 

3. The AEN evaluates the new state, and a comparison of this evalua- 
tion with the score of the previous state gives a measure of internal 
reinforcement. 

4. This reinforcement controls the modification of weights in both 
AEN and ASN, which leads to learning. 

5. Over time, the AEN improves and becomes a good state evaluator, 
and reinforcement estimates become more reliable. Also, the ASN 
improves so that the recommended control actions of the fuzzy 
system become more correct with higher measures of confidence. 

The AEN and ASN do not necessarily have the same number of
hidden or output units. For example, as shown in Figure 5, in the
cart-pole balancing experiment, 5 and 13 units are used in the hidden
layers of the AEN and ASN, respectively.
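Step 3 of the algorithm above, comparing the evaluation of the new state with the score of the previous state, can be sketched as a single function. The exact form used here (no signal at the start of a trial, reward minus the previous score on failure, and reward plus the discounted new score minus the previous score otherwise) follows the usual adaptive-critic convention and is an assumption of this example, as are the parameter names and the discount `gamma`; the details of ARIC itself are given in [10].

```python
def internal_reinforcement(reward, score_new, score_old,
                           gamma=0.9, start=False, failed=False):
    """Internal reinforcement from two successive state evaluations.

    start:  first step of a trial, nothing to compare against yet
    failed: the new state is a failure state, so it has no future score
    """
    if start:
        return 0.0
    if failed:
        return reward - score_old
    return reward + gamma * score_new - score_old


# The evaluation improved from 0.5 to 0.6 with no external reward.
print(internal_reinforcement(0.0, 0.6, 0.5))
# A failure (external reward -1) from a state that had scored 0.5.
print(internal_reinforcement(-1.0, 0.0, 0.5, failed=True))
```

A positive value indicates that the last action led to a better-rated state, and it is this signal that drives the weight changes in both networks in step 4.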

5.2 GARIC: Generalized ARIC 

Berenji and Khedkar [12] have extended ARIC in many respects 
including the following: 

• Learning is achieved by full integration of fuzzy inference into a 
feedforward network, which can then adaptively improve perfor- 
mance by using gradient descent methods. 

• The fuzzy memberships used in the definition of the labels are mod- 
ified (tuned) globally in all the rules, rather than being locally mod- 
ified in each individual rule. 

• GARIC can compensate for inappropriate definitions of fuzzy mem- 
bership functions in the antecedent of control rules. GARIC is the 
first architecture to do this. 

• GARIC introduces a new localized mean of maximum (LMOM)
method for combining the conclusions of several firing control rules.
This approach basically defuzzifies each rule individually and then
combines the resulting defuzzified values from all the firing rules.

• Only monotonic membership functions are used in ARIC, whereas
GARIC allows any type of differentiable membership functions to
be used in constructing fuzzy logic controllers.

Figure 5: ARIC applied to cart pole balancing. (Panel label:
Action-State Evaluation Network.)

Figure 6: The Architecture of GARIC.
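The idea of defuzzifying each rule locally and then combining the results can be sketched for triangular conclusion labels. Clipping a triangle at the rule's firing strength leaves a flat top, and the local mean of maximum is the midpoint of that plateau. The triangular parameterization (center, left spread, right spread) and the strength-weighted average used to combine the per-rule values are assumptions made for this example; see [12] for the method as defined in GARIC.

```python
def local_mean_of_maximum(center, left_spread, right_spread, strength):
    """Midpoint of the plateau obtained by clipping a triangular label
    (peak at `center`) at height `strength`."""
    lo = center - left_spread * (1.0 - strength)
    hi = center + right_spread * (1.0 - strength)
    return 0.5 * (lo + hi)


def combine_rules(fired):
    """fired: list of (strength, center, left_spread, right_spread).
    Defuzzify each rule on its own, then take a strength-weighted average."""
    total = sum(w for w, _, _, _ in fired)
    if total == 0.0:
        raise ValueError("no rule fired")
    return sum(w * local_mean_of_maximum(c, sl, sr, w)
               for w, c, sl, sr in fired) / total


# Two hypothetical firing rules; the first has an asymmetric conclusion.
print(combine_rules([(0.5, 0.0, 1.0, 3.0), (0.5, 2.0, 1.0, 1.0)]))
```

For a symmetric label the local value is always the center; for an asymmetric one it shifts toward the wider side as the firing strength drops.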

The architecture of GARIC, which is schematically shown in Figure 
6, has three components: 

• The Action Selection Network maps a state vector into a recommended
action F, using fuzzy inference.

• The Action Evaluation Network maps a state vector and a failure
signal into a scalar score which indicates state goodness. This is
also used to produce internal reinforcement $\hat{r}$.

• The Stochastic Action Modifier uses both F and $\hat{r}$ to produce an
action F' which is applied to the plant.

The ensuing state is fed back to the controller along with a boolean
failure signal. Learning occurs by fine-tuning the free parameters in
the two networks: in the AEN, the weights are adjusted, and in the
ASN, the parameters describing the fuzzy membership functions are
changed.
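The role of the Stochastic Action Modifier can be sketched as a random perturbation of the recommended action whose size shrinks as the internal reinforcement grows. Drawing the applied action from a Gaussian centered on the recommendation, and using exp(-r) as the standard deviation, are assumptions made for this illustration; any non-negative, decreasing function of the internal reinforcement would show the same behavior.

```python
import math
import random


def perturbation_scale(r_hat):
    """Assumed spread of the exploration noise: large when the internal
    reinforcement is low, small when it is high."""
    return math.exp(-r_hat)


def stochastic_action(recommended, r_hat, rng):
    """Draw the action applied to the plant around the recommended one."""
    return rng.gauss(recommended, perturbation_scale(r_hat))


rng = random.Random(0)  # seeded only to make the sketch repeatable
print(perturbation_scale(-1.0), perturbation_scale(2.0))
print(stochastic_action(5.0, 2.0, rng))
```

Early in learning, when states receive weak reinforcement, the controller explores widely; once the fuzzy controller performs well, its recommendation is followed almost unchanged.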

Figure 7 presents the GARIC architecture as it is applied to cart-pole
balancing. The AEN network has 4 input units, a bias input unit,
5 hidden units, and an output unit.

The GARIC architecture proposes a new way of designing and tuning
a fuzzy logic controller. The knowledge used by an experienced
operator in controlling a process can now be modeled using approxi- 
mate linguistic terms and later refined through the process of learning 
from experience. GARIC provides a well-balanced method for combin- 
ing the qualitative knowledge of human experts (in terms of symbolic 
rules) and the learning strength of the artificial neural networks. The 
architecture is general enough for use in other rule-based systems (be- 
sides controllers) which perform fuzzy logic inference. 


6. CONCLUSION 

Fuzzy logic controllers can use much of the heuristic knowledge of ex- 
perienced human operators in controlling a process. In this chapter, we 
discussed how these controllers can provide alternative methods for the 
design of controllers for complex systems while requiring no analytical 
model of the process. We showed how neural networks can be effec- 
tively used to provide learning capabilities for these controllers. We 
discussed two such hybrid methods which use reinforcement learning. 
Architectures such as ARIC and GARIC, as discussed in this chapter, 
provide hybrid methods for combining the knowledge representation 
strength of fuzzy inference systems with the adaptive learning capabil- 
ity of neural networks. 

For fuzzy logic controllers, structure identification issues, such as the
number of rules in the rule base, have to be resolved. Unsupervised
learning and clustering methods are definitely useful here. FLCs can 
also use neural network methods in which a network is grown (i.e., 
new nodes are added to the network) based on the system’s learn- 
ing behavior. Parameter identification issues, such as the shape of 
membership functions, can be resolved using methods such as ARIC 
and GARIC. On the other hand, stability analysis of fuzzy logic con- 
trollers requires more general treatment than the limited classes which 
have already been studied. For neural networks, fuzzy logic provides 
a unique knowledge representation power for both including the prior 
knowledge in the network and explaining the results of learning. Here, 
issues such as the speed of learning, convergence, and stability of the 
neural networks for control need to be resolved.